REINFORCEMENT LEARNING IN THE JOINT SPACE: VALUE ITERATION IN WORLDS WITH CONTINUOUS STATES AND ACTIONS
Authors
Christopher Kenneth Monson
Abstract
Christopher Kenneth Monson, Department of Computer Science, Master of Science

Continuous space reinforcement learning algorithms frequently fail to address the possibility of a continuous action space, presumably because of the difficulty of discovering the best action for a particular state. This can, in some cases, severely limit the ability of a learning algorithm to tackle common problems in which different portions of the state space require distinct action granularity. Naïve action discretization does not suffice for problems of this nature, so traditional reinforcement learning approaches that consider only the continuous state space fail to solve them. JoSTLe (Joint Space Triangulation Learner) addresses the need for a reinforcement learning approach that can handle a continuous action space by means of intelligent discretization. It employs the variable resolution discretization techniques of Munos and Moore [MM02], but in an augmented “joint” space, one that includes actions as well as states. The algorithm is shown to work on a problem that requires the treatment of a continuous action space, as well as on one that does not. The efficacy of the algorithm and its sensitivity to parameter tuning are demonstrated through mathematical arguments and experimental data.
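The abstract does not reproduce the algorithm itself, but the backup underlying it is ordinary value iteration carried out over a discretization of the joint state-action space. The sketch below is a minimal illustration of that idea on a uniform grid, assuming toy one-dimensional dynamics and reward; it is not JoSTLe's variable-resolution triangulation, and every grid size and constant is an illustrative assumption.

```python
# Minimal sketch: value iteration over a uniformly discretized joint
# state-action space. NOT the JoSTLe algorithm (which triangulates the joint
# space at variable resolution); dynamics, reward, and grids are toy choices.
import numpy as np

GAMMA = 0.95
states = np.linspace(-1.0, 1.0, 41)    # discretized continuous state
actions = np.linspace(-0.5, 0.5, 21)   # discretized continuous action

def step(s, a):
    """Toy deterministic dynamics: the action shifts the state, clipped to bounds."""
    return np.clip(s + a, -1.0, 1.0)

def reward(s, a):
    """Penalize distance from the goal state 0 and large actions."""
    return -(s ** 2) - 0.1 * (a ** 2)

V = np.zeros(len(states))
for _ in range(200):                   # value-iteration sweeps
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        # Back up over every joint (state, action) cell reachable from s.
        q = [reward(s, a) + GAMMA * V[np.abs(states - step(s, a)).argmin()]
             for a in actions]
        V_new[i] = max(q)
    converged = np.max(np.abs(V_new - V)) < 1e-6
    V = V_new
    if converged:
        break

i0 = np.abs(states).argmin()           # grid cell nearest s = 0
q0 = [reward(states[i0], a) + GAMMA * V[np.abs(states - step(states[i0], a)).argmin()]
      for a in actions]
print("V(0) ~= %.3f, greedy action at s=0: %.3f" % (V[i0], actions[int(np.argmax(q0))]))
```

A coarser or finer action grid changes which greedy actions are even representable, which is the granularity problem the thesis addresses by refining cells of the joint space where needed rather than fixing one resolution in advance.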
Similar resources
Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis
We introduce new, efficient algorithms for value iteration with multiple reward functions and continuous state. We also give an algorithm for finding the set of all nondominated actions in the continuous state setting. This novel extension is appropriate for environments with continuous or finely discretized states where generalization is required, as is the case for data analysis of randomized...
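For illustration only, the dominance test behind a "set of all nondominated actions" can be sketched as a Pareto filter over per-objective action values; the Q-table below is a made-up example, not data or code from the cited work.

```python
# Hedged sketch of nondominated-action filtering under multiple reward
# functions: keep an action unless some other action is at least as good on
# every objective and strictly better on one. Values are illustrative.
import numpy as np

# Rows: candidate actions; columns: Q-values under two reward functions.
Q = np.array([[1.0, 0.2],
              [0.8, 0.9],
              [0.5, 0.5],   # dominated by the second row
              [1.2, 0.1]])

def nondominated(q):
    keep = []
    for i, qi in enumerate(q):
        dominated = any(np.all(qj >= qi) and np.any(qj > qi)
                        for j, qj in enumerate(q) if j != i)
        if not dominated:
            keep.append(i)
    return keep

print("Nondominated action indices:", nondominated(Q))  # -> [0, 1, 3]
```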
Learning control under uncertainty: A probabilistic Value-Iteration approach
In this paper, we introduce a probabilistic version of the well-studied Value-Iteration approach, i.e. Probabilistic Value-Iteration (PVI). The PVI approach can handle continuous states and actions in an episodic Reinforcement Learning (RL) setting, while using Gaussian Processes to model the state uncertainties. We further show how the approach can be efficiently realized, making it suitable fo...
Continuous-State Reinforcement Learning with Fuzzy Approximation
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. There exist several convergent and consistent RL algorithms which have been intensively studied. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed i...
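As a rough illustration of how a fuzzy representation can cover a continuous state with a small table, the sketch below interpolates Q-values through triangular membership functions; the centers, widths, and values are illustrative assumptions, not the algorithm studied in the cited paper.

```python
# Hedged sketch of fuzzy value interpolation: triangular membership degrees
# over a continuous state weight a small table of core Q-values.
import numpy as np

centers = np.linspace(0.0, 1.0, 5)                             # membership centers
q_core = np.random.default_rng(0).random((len(centers), 3))    # 3 discrete actions

def memberships(s):
    """Triangular membership degrees of state s w.r.t. each center (normalized)."""
    w = np.maximum(0.0, 1.0 - np.abs(s - centers) / 0.25)
    return w / w.sum()

def q_value(s, a):
    """Approximate Q(s, a) as the membership-weighted mix of core values."""
    return memberships(s) @ q_core[:, a]

s = 0.37
print("Q(s, a) for each action:", [round(q_value(s, a), 3) for a in range(3)])
```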
Reinforcement Using Supervised Learning for Policy Generalization
Applying reinforcement learning in large Markov Decision Processes (MDPs) is an important issue for solving very large problems. Since exact resolution is often intractable, many approaches have been proposed to approximate the value function (for example, TD-Gammon (Tesauro 1995)) or to approximate the policy directly by gradient methods (Russell & Norvig 2002). Such approaches provide a poli...
Model-Based Reinforcement Learning with Continuous States and Actions
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the tran...
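One ingredient mentioned above, a Gaussian process transition model used inside a dynamic-programming backup, can be sketched as follows. This is not the GPDP algorithm itself; the dynamics, reward, grids, and kernel below are toy assumptions for illustration.

```python
# Hedged sketch: fit a GP to observed transitions, then run value iteration
# against the GP's mean prediction of the next state. Toy 1-D problem.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Transition samples from an "unknown" system s' = 0.9*s + a + noise.
SA = rng.uniform(-1.0, 1.0, size=(200, 2))            # columns: state, action
S_next = 0.9 * SA[:, 0] + SA[:, 1] + 0.01 * rng.standard_normal(200)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(SA, S_next)                                      # learned transition model

states = np.linspace(-1.0, 1.0, 21)
actions = np.linspace(-0.3, 0.3, 7)
V = np.zeros(len(states))
GAMMA = 0.9

for _ in range(100):                                    # value iteration on the model
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        sa = np.column_stack([np.full(len(actions), s), actions])
        s_pred = gp.predict(sa)                         # GP mean of next states
        idx = np.abs(states[None, :] - s_pred[:, None]).argmin(axis=1)
        V_new[i] = np.max(-(s ** 2) - 0.1 * actions ** 2 + GAMMA * V[idx])
    V = V_new

print("Model-based V at s=0:", round(V[np.abs(states).argmin()], 3))
```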
Publication date: 2003